{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Qwen Quick Start Notebook\n",
    "\n",
    "This notebook shows how to train and infer the Qwen-7B-Chat model on a single GPU. Similarly, Qwen-1.8B-Chat, Qwen-14B-Chat can also be leveraged for the following steps. We only need to modify the corresponding `model name` and hyper-parameters. The training and inference of Qwen-72B-Chat requires higher GPU requirements and larger disk space."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Requirements\n",
    "- Python 3.8 and above\n",
    "- Pytorch 1.12 and above, 2.0 and above are recommended\n",
    "- CUDA 11.4 and above are recommended (this is for GPU users, flash-attention users, etc.)\n",
    "We test the training of the model on an A10 GPU (24GB)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extra\n",
    "If you need to speed up, you can install  `flash-attention`. The details of the installation can be found [here](https://github.com/Dao-AILab/flash-attention)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!git clone https://github.com/Dao-AILab/flash-attention\n",
    "!cd flash-attention && pip install .\n",
    "# Below are optional. Installing them might be slow.\n",
    "# !pip install csrc/layer_norm\n",
    "# If the version of flash-attn is higher than 2.1.1, the following is not needed.\n",
    "# !pip install csrc/rotary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 0: Install Package Requirements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install transformers>=4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed modelscope"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 1: Download Model\n",
    "When using `transformers` in some regions, the model cannot be automatically downloaded due to network problems. We recommend using `modelscope` to download the model first, and then use `transformers` for inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from modelscope import snapshot_download\n",
    "\n",
    "# Downloading model checkpoint to a local dir model_dir.\n",
    "model_dir = snapshot_download('Qwen/Qwen-7B-Chat', cache_dir='.', revision='master')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2: Direct Model Inference \n",
    "We recommend two ways to do model inference: `modelscope` and `transformers`.\n",
    "\n",
    "#### 2.1 Model Inference with ModelScope"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from modelscope import AutoModelForCausalLM, AutoTokenizer\n",
    "from modelscope import GenerationConfig\n",
    "\n",
    "# Note: The default behavior now has injection attack prevention off.\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen-7B-Chat/\", trust_remote_code=True)\n",
    "\n",
    "# use bf16\n",
    "# model = AutoModelForCausalLM.from_pretrained(\"qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True, bf16=True).eval()\n",
    "# use fp16\n",
    "# model = AutoModelForCausalLM.from_pretrained(\"qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True, fp16=True).eval()\n",
    "# use cpu only\n",
    "# model = AutoModelForCausalLM.from_pretrained(\"qwen/Qwen-7B-Chat/\", device_map=\"cpu\", trust_remote_code=True).eval()\n",
    "# use auto mode, automatically select precision based on the device.\n",
    "model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True).eval()\n",
    "\n",
    "# Specify hyperparameters for generation. But if you use transformers>=4.32.0, there is no need to do this.\n",
    "# model.generation_config = GenerationConfig.from_pretrained(\"Qwen/Qwen-7B-Chat/\", trust_remote_code=True) # 可指定不同的生成长度、top_p等相关超参\n",
    "\n",
    "# 第一轮对话 1st dialogue turn\n",
    "response, history = model.chat(tokenizer, \"你好\", history=None)\n",
    "print(response)\n",
    "# 你好！很高兴为你提供帮助。\n",
    "\n",
    "# 第二轮对话 2nd dialogue turn\n",
    "response, history = model.chat(tokenizer, \"给我讲一个年轻人奋斗创业最终取得成功的故事。\", history=history)\n",
    "print(response)\n",
    "# 这是一个关于一个年轻人奋斗创业最终取得成功的故事。\n",
    "# 故事的主人公叫李明，他来自一个普通的家庭，父母都是普通的工人。从小，李明就立下了一个目标：要成为一名成功的企业家。\n",
    "# 为了实现这个目标，李明勤奋学习，考上了大学。在大学期间，他积极参加各种创业比赛，获得了不少奖项。他还利用课余时间去实习，积累了宝贵的经验。\n",
    "# 毕业后，李明决定开始自己的创业之路。他开始寻找投资机会，但多次都被拒绝了。然而，他并没有放弃。他继续努力，不断改进自己的创业计划，并寻找新的投资机会。\n",
    "# 最终，李明成功地获得了一笔投资，开始了自己的创业之路。他成立了一家科技公司，专注于开发新型软件。在他的领导下，公司迅速发展起来，成为了一家成功的科技企业。\n",
    "# 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险，不断学习和改进自己。他的成功也证明了，只要努力奋斗，任何人都有可能取得成功。\n",
    "\n",
    "# 第三轮对话 3rd dialogue turn\n",
    "response, history = model.chat(tokenizer, \"给这个故事起一个标题\", history=history)\n",
    "print(response)\n",
    "# 《奋斗创业：一个年轻人的成功之路》"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.2 Model Inference with transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "from transformers.generation import GenerationConfig\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen-7B-Chat/\", trust_remote_code=True)\n",
    "\n",
    "# use bf16\n",
    "# model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True, bf16=True).eval()\n",
    "# use fp16\n",
    "# model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", device_map=\"auto\", trust_remote_code=True, fp16=True).eval()\n",
    "# use cpu only\n",
    "# model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", device_map=\"cpu\", trust_remote_code=True).eval()\n",
    "# use auto mode, automatically select precision based on the device.\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    \"Qwen/Qwen-7B-Chat/\",\n",
    "    device_map=\"auto\",\n",
    "    trust_remote_code=True\n",
    ").eval()\n",
    "\n",
    "# Specify hyperparameters for generation. But if you use transformers>=4.32.0, there is no need to do this.\n",
    "# model.generation_config = GenerationConfig.from_pretrained(\"Qwen/Qwen-7B-Chat/\", trust_remote_code=True)\n",
    "\n",
    "# 1st dialogue turn\n",
    "response, history = model.chat(tokenizer, \"你好\", history=None)\n",
    "print(response)\n",
    "# 你好！很高兴为你提供帮助。\n",
    "\n",
    "# 2nd dialogue turn\n",
    "response, history = model.chat(tokenizer, \"给我讲一个年轻人奋斗创业最终取得成功的故事。\", history=history)\n",
    "print(response)\n",
    "# 这是一个关于一个年轻人奋斗创业最终取得成功的故事。\n",
    "# 故事的主人公叫李明，他来自一个普通的家庭，父母都是普通的工人。从小，李明就立下了一个目标：要成为一名成功的企业家。\n",
    "# 为了实现这个目标，李明勤奋学习，考上了大学。在大学期间，他积极参加各种创业比赛，获得了不少奖项。他还利用课余时间去实习，积累了宝贵的经验。\n",
    "# 毕业后，李明决定开始自己的创业之路。他开始寻找投资机会，但多次都被拒绝了。然而，他并没有放弃。他继续努力，不断改进自己的创业计划，并寻找新的投资机会。\n",
    "# 最终，李明成功地获得了一笔投资，开始了自己的创业之路。他成立了一家科技公司，专注于开发新型软件。在他的领导下，公司迅速发展起来，成为了一家成功的科技企业。\n",
    "# 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险，不断学习和改进自己。他的成功也证明了，只要努力奋斗，任何人都有可能取得成功。\n",
    "\n",
    "# 3rd dialogue turn\n",
    "response, history = model.chat(tokenizer, \"给这个故事起一个标题\", history=history)\n",
    "print(response)\n",
    "# 《奋斗创业：一个年轻人的成功之路》"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3: LoRA Fine-Tuning Model (Single GPU)\n",
    "\n",
    "#### 3.1 Download Example Training Data\n",
    "Download the data required for training; here, we provide a tiny dataset as an example. It is sampled from [Belle](https://github.com/LianjiaTech/BELLE)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/qwen_recipes/Belle_sampled_qwen.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can refer to this format to prepare the dataset. Below is a simple example list with 1 sample:\n",
    "\n",
    "```json\n",
    "[\n",
    "  {\n",
    "    \"id\": \"identity_0\",\n",
    "    \"conversations\": [\n",
    "      {\n",
    "        \"from\": \"user\",\n",
    "        \"value\": \"你好\"\n",
    "      },\n",
    "      {\n",
    "        \"from\": \"assistant\",\n",
    "        \"value\": \"我是一个语言模型，我叫通义千问。\"\n",
    "      }\n",
    "    ]\n",
    "  }\n",
    "]\n",
    "```\n",
    "\n",
    "You can also use multi-turn conversations as the training set. Here is a simple example:\n",
    "\n",
    "```json\n",
    "[\n",
    "  {\n",
    "    \"id\": \"identity_0\",\n",
    "    \"conversations\": [\n",
    "      {\n",
    "        \"from\": \"user\",\n",
    "        \"value\": \"你好\"\n",
    "      },\n",
    "      {\n",
    "        \"from\": \"assistant\",\n",
    "        \"value\": \"你好！我是一名AI助手，我叫通义千问，有需要请告诉我。\"\n",
    "      },\n",
    "      {\n",
    "        \"from\": \"user\",\n",
    "        \"value\": \"你都能做什么\"\n",
    "      },\n",
    "      {\n",
    "        \"from\": \"assistant\",\n",
    "        \"value\": \"我能做很多事情，包括但不限于回答各种领域的问题、提供实用建议和指导、进行多轮对话交流、文本生成等。\"\n",
    "      }\n",
    "    ]\n",
    "  }\n",
    "]\n",
    "```\n",
    "\n",
    "#### 3.2 Fine-Tune the Model\n",
    "\n",
    "You can directly run the prepared training script to fine-tune the model. Remember to check `model_name_or_path`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python ../finetune/deepspeed/finetune.py \\\n",
    "    --model_name_or_path \"Qwen/Qwen-7B-Chat/\"\\\n",
    "    --data_path  \"Belle_sampled_qwen.json\"\\\n",
    "    --bf16 \\\n",
    "    --output_dir \"output_qwen\" \\\n",
    "    --num_train_epochs 5 \\\n",
    "    --per_device_train_batch_size 1 \\\n",
    "    --per_device_eval_batch_size 1 \\\n",
    "    --gradient_accumulation_steps 16 \\\n",
    "    --evaluation_strategy \"no\" \\\n",
    "    --save_strategy \"steps\" \\\n",
    "    --save_steps 1000 \\\n",
    "    --save_total_limit 10 \\\n",
    "    --learning_rate 1e-5 \\\n",
    "    --weight_decay 0.1 \\\n",
    "    --adam_beta2 0.95 \\\n",
    "    --warmup_ratio 0.01 \\\n",
    "    --lr_scheduler_type \"cosine\" \\\n",
    "    --logging_steps 1 \\\n",
    "    --report_to \"none\" \\\n",
    "    --model_max_length 512 \\\n",
    "    --gradient_checkpointing \\\n",
    "    --lazy_preprocess \\\n",
    "    --use_lora"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 Merge Weights\n",
    "\n",
    "LoRA training only saves the adapter parameters. You can load the fine-tuned model and merge weights as shown below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoModelForCausalLM\n",
    "from peft import PeftModel\n",
    "import torch\n",
    "\n",
    "model = AutoModelForCausalLM.from_pretrained(\"Qwen/Qwen-7B-Chat/\", torch_dtype=torch.float16, device_map=\"auto\", trust_remote_code=True)\n",
    "model = PeftModel.from_pretrained(model, \"output_qwen/\")\n",
    "merged_model = model.merge_and_unload()\n",
    "merged_model.save_pretrained(\"output_qwen_merged\", max_shard_size=\"2048MB\", safe_serialization=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tokenizer files are not saved in the new directory in this step. You can copy the tokenizer files or use the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\n",
    "    \"Qwen/Qwen-7B-Chat/\",\n",
    "    trust_remote_code=True\n",
    ")\n",
    "\n",
    "tokenizer.save_pretrained(\"output_qwen_merged\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.4 Test the Model\n",
    "\n",
    "After merging the weights, we can test the model as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "from transformers.generation import GenerationConfig\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"output_qwen_merged\", trust_remote_code=True)\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    \"output_qwen_merged\",\n",
    "    device_map=\"auto\",\n",
    "    trust_remote_code=True\n",
    ").eval()\n",
    "\n",
    "response, history = model.chat(tokenizer, \"你好\", history=None)\n",
    "print(response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  },
  "vscode": {
   "interpreter": {
    "hash": "2d58e898dde0263bc564c6968b04150abacfd33eed9b19aaa8e45c040360e146"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
